library(tidyverse)
results <- readRDS("../data/abm-diffusion3/week08_results_01.rds")
hist(results$df_final_summary$R_share_mean)
More Diffusion Processes
Last week we explored simple diffusion processes (also known as simple contagion), where a single contact can trigger transmission. This week, we will deepen our understanding of this type of diffusion process by applying three complementary analysis methods to the model:
Sensitivity Analysis
How do small changes in key parameters (e.g., infection probability \(p\)) affect our outcomes?

Robustness Analysis
How does the design of our simulation (e.g., update order, scheduling scheme) influence results?

Dispersion Analysis
How do diffusion metrics (final adoption rate, time to saturation) vary across different runs or network structures?
You can download the complete code here.
🎯 Learning Goals
By the end of this chapter, you will be able to:
Differentiate the three major sources of uncertainty in agent-based models—input uncertainty, model uncertainty, and stochastic variability—and explain why each one demands a distinct diagnostic strategy.
Perform and interpret a sensitivity analysis (grid search and Latin Hypercube Sampling) that links parameter variation to aggregate outcomes, using heatmaps and regressions.
Design and execute robustness checks that test whether architectural choices in a model drive its results.
The Problem of Uncertainty & Variability
Agent-based models offer strong internal validity: the modeler controls all assumptions and mechanisms, making it possible to trace aggregate outcomes back to specific design choices. When parameters are grounded in empirical evidence, agent-based models (ABMs) become powerful tools for testing how behavioral rules and environmental conditions shape system-level patterns (Bruch and Atwell 2015) .
But this strength can also be misleading. All conclusions are conditional on the structure of the model and the specific values used. What if a result changes when we slightly tweak a parameter? What if a minor change in model architecture (like the update order) flips the outcome? Or what if the result we observed was simply due to random chance in one simulation run?
These are not trivial concerns—they represent three distinct kinds of uncertainty that can lead us to overinterpret patterns that are actually artifacts of a series of lucky (or unlucky) coincidences. This uncertainty in ABMs comes from three distinct sources, each calling for its own diagnostic lens:
| Source of variation | Typical origin | Diagnostic lens | Key question |
|---|---|---|---|
| Input uncertainty | Imperfect or noisy parameter estimates, sketchy data on the initial population | Uncertainty analysis, Sensitivity analysis | Which parameters, and in which ranges, sway the outcome the most? |
| Model uncertainty | Architectural choices that are mathematically equivalent but computationally different (update order, precision, tie policies) | Robustness analysis | Does changing the structure of the code change the outcome? |
| Stochastic variability | Random draws in models that are not deterministic but rely on probabilistic functions | Dispersion analysis | If I run the same simulation twice, how far apart can results drift? |
A scientific explanation or prediction is only as credible as the uncertainty bands that surround it. Enumerating and quantifying the three layers—input uncertainty, model uncertainty, and stochastic variability—guards us against confusing mere model artifacts for substantive social insights.
Much of the taxonomy below is rooted in the risk-analysis literature (McKay, Beckman, and Conover 1979).
Input uncertainty
Input (or epistemic) uncertainty originates in imperfect knowledge about
- point estimates of behavioral parameters (e.g., transmission probability \(p\)),
- initial conditions (e.g., spatial distribution of seeds), and
- structural data (e.g., the true degree distribution of the underlying network).
Two complementary diagnostics help us cope with that ignorance (Saltelli et al. 2008):
| Diagnostic lens | Core task | Typical output |
|---|---|---|
| Uncertainty analysis | Quantify total output variance induced by uncertain inputs | A distribution (histogram, density, credible band) of the outcome |
| Sensitivity analysis | Attribute portions of that variance to individual inputs or their interactions | Heatmaps, regression outputs, variance‐decomposition indices (Sobol) |
Before we dive into sensitivity analysis, let’s take a quick look at the uncertainty in our simulation results. As a first step, I’ve prepared some output data that we can load and visualize. Specifically, we’ll look at the distribution of R_share_mean—the average total death rate observed at the end of each simulation run.
What we see is a U-shaped distribution. This tells us something important: in many runs, the model predicts either a very low or a very high total death rate, with fewer simulations landing in the middle range. In other words, the model outcomes show substantial variability.
This variability is not just noise—it’s a window into the model’s behavior. Uncertainty analysis shows us that results vary. Sensitivity analysis is about figuring out why they vary.
So let’s now take the next step: let’s try to explain this variance by identifying which input parameters drive the differences we see in model outcomes.
Grid search (Brute Force)
To explore how the model behaves across a range of parameter values, we will start with the simplest method: a grid search. This is a brute-force method where we systematically test every combination of selected parameters.
In this example, we vary two key parameters:
- transmit_prob: the probability that an infection spreads during contact,
- I_R_prob: the probability that an infectious individual dies.
We fix I_S_prob (the recovery probability) at a constant value of 0.5 and vary the other two parameters from 0 to 1 in steps of 0.1. This results in an \(11 \times 11 = 121\)-point grid. The data shown earlier came from exactly this simulation.
simulation <- list(
model = "all0",
replications = 15,
tbl_parameter = expand.grid(
transmit_prob = seq(0, 1, by=.1),
I_R_prob = seq(0, 1, by=.1),
I_S_prob = 0.5
),
master_seed = 42,
max_cores = 15
)
# results <- run_simulation(simulation)
results <- readRDS("../data/abm-diffusion3/week08_results_01.rds")
After running the simulation, we summarize the results by plotting the average proportion of agents in the recovered state (R_share_mean) across replications for each parameter combination.
For this, I prepared two plot functions that generate two types of heatmaps.
Show R Code (plot functions)
library(plotly)
get_type_colors <- function() {
c(S = "#a5d875", # Susceptible
I = "#eb6a6a", # Infectious
R = "#73bcde" # Removed (Immune)
)
}
plot_heatmap <- function(df,
x_var,
y_var,
fill_var,
color = c("grey95", "grey50", "grey0")
) {
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[fill_var]])) +
geom_tile(color = "white") +
scale_fill_gradient2(
low = color[1], mid = color[2], high = color[3],
midpoint = 0.5, limits = c(0, 1),
name = "R Share (mean)"
) +
labs(
title = paste("Heatmap of", fill_var),
x = x_var,
y = y_var
) +
theme_minimal(base_size = 14) +
theme(
panel.grid = element_blank(),
axis.text = element_text(color = "black")
)
}
plot_SIR_shares <- function(df,
x_var,
y_var) {
type_colors <- get_type_colors()
col_S <- col2rgb(type_colors["S"])
col_I <- col2rgb(type_colors["I"])
col_R <- col2rgb(type_colors["R"])
df_blended <- df |>
rowwise() |>
mutate(
r = S_share_mean * col_S[1,1] + I_share_mean * col_I[1,1] + R_share_mean * col_R[1,1],
g = S_share_mean * col_S[2,1] + I_share_mean * col_I[2,1] + R_share_mean * col_R[2,1],
b = S_share_mean * col_S[3,1] + I_share_mean * col_I[3,1] + R_share_mean * col_R[3,1],
fill_color = rgb(r / 255, g / 255, b / 255),
tooltip = paste0(
"<b>", x_var, ":</b> ", .data[[x_var]], "<br>",
"<b>", y_var, ":</b> ", .data[[y_var]], "<br>",
"<b>S:</b> ", round(S_share_mean, 3), "<br>",
"<b>I:</b> ", round(I_share_mean, 3), "<br>",
"<b>R:</b> ", round(R_share_mean, 3)
)
) |>
ungroup()
ggplot(df_blended, aes(x = .data[[x_var]], y = .data[[y_var]])) +
geom_tile(aes(fill = fill_color, text = tooltip),
color = "white",
show.legend = FALSE) +
scale_fill_identity() +
labs(
title = "SIR Proportion Heatmap",
x = x_var,
y = y_var
) +
theme_minimal(base_size = 14) +
theme(
panel.grid = element_blank(),
axis.text = element_text(color = "black")
)
}
plot_SIR_shares_interactive <- function(df, x_var, y_var) {
p <- plot_SIR_shares(df, x_var, y_var)
ggplotly(p, tooltip = "text") |> layout(showlegend = FALSE)
}
Heatmaps are an excellent way to reveal patterns such as:
- Threshold effects — sharp changes in outcome when parameters cross a critical value,
- Interaction effects — how two parameters jointly shape the outcome,
- Stable zones — regions where outcomes remain largely unchanged across parameter values.
Let us take a look at our example:
plot_heatmap(df = results$df_final_summary,
x_var = "transmit_prob",
y_var = "I_R_prob",
             fill_var = "R_share_mean")
In this case, we observe:
- For transmit_prob of 0.3 and below, the removal rate stays low regardless of I_R_prob.
- For transmit_prob of 0.5 and above, removal rates are consistently high whenever I_R_prob is not zero.
- A critical transition region appears between 0.3 and 0.5 in transmit_prob, and 0.1 to 1.0 in I_R_prob, where small changes can result in large shifts in outcomes.
This pattern gives us practical insights: it tells us when the model outcome is sensitive to parameter changes and when it is not. That knowledge is incredibly useful for the planning of empirical research. If a model outcome remains stable across a wide range of values, we might only need a rough estimate of the corresponding parameter. But if small changes in a parameter lead to large differences in outcomes, we know that this parameter must be measured with greater precision. This allows researchers to prioritize data collection efforts where they are most impactful.
To go beyond just the average R_share_mean, we can visualize the full composition of the population—how much is Susceptible, Infectious, and Removed—by blending their typical colors. Each cell in the plot is shaded according to the mix of S, I, and R agents at the end of the simulation.
plot_SIR_shares_interactive(
df = results$df_final_summary,
x_var = "transmit_prob",
y_var = "I_R_prob"
)
This plot is interactive! Hover your mouse over any cell to see exact values for S, I, and R. We used the plotly package to enhance this visualization.
This blended view tells us which agent type dominates across different parts of the parameter space:
- Green areas → mostly Susceptible (epidemic dies out),
- Red areas → high Infectious load (ongoing spread),
- Blue areas → mostly Removed (widespread death).
It’s a visually intuitive way to understand complex population dynamics and identify tipping points at a glance.
Running into limitations
In our earlier examples, we used tbl_parameter to define a small grid of selected parameter combinations. This worked well for visualizing how two parameters interact—but what happens when we want to explore more parameters, or cover them more finely? Let’s say you have three parameters, each ranging from 0 to 1, and you decide to evaluate them in increments of 0.05. That gives you 21 values per parameter. The total number of combinations therefore explodes to \(21^{3}=9,261\) combinations.
That’s 9,261 full simulation runs—and that’s before accounting for replications to estimate stochastic variability. Suddenly, the brute-force grid approach becomes a bottleneck. Despite its simplicity, grid search has some serious limitations:
Computational burden
Running thousands of simulations can be expensive in both time and processing power, especially if your model includes many agents, time steps, or complex interactions.

Discretisation bias
By using fixed steps, you restrict yourself to a predefined grid and entirely exclude other values (e.g., 0.51111 or 0.387). If your model is non-linear or includes thresholds, important behavioral shifts could go undetected between grid points.

Distributional mismatch
A regular grid assumes all values in the range are equally likely (a uniform distribution). But what if prior knowledge suggests the parameter is most likely around 0.3 with a standard deviation of 0.1? A uniform grid wastes resources evaluating parameter combinations that are implausible or even irrelevant.
To fix the third problem, we can transform uniform values such as seq(0, 1, by=.05) into samples from another distribution using inverse functions like qnorm(seq(0, 1, by=.05), mean, sd) in R. But the computational burden and discretisation bias still remain.
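As a quick illustration of this inverse-CDF trick (a sketch with assumed example values for the mean and standard deviation), note that the endpoints 0 and 1 must be trimmed, since `qnorm(0)` and `qnorm(1)` return `-Inf` and `Inf`:

```r
# Map an evenly spaced grid on (0, 1) onto a normal prior with
# mean 0.3 and sd 0.1 via the inverse CDF. Endpoints are trimmed
# because qnorm(0) and qnorm(1) are -Inf and Inf.
u <- seq(0.05, 0.95, by = 0.05)
p <- qnorm(u, mean = 0.3, sd = 0.1)
summary(p)  # the grid points now cluster around 0.3
```

The same pattern works for any distribution with a quantile function (`qbeta`, `qlnorm`, etc.).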
Rather than exhaustively testing all combinations, we can switch to sampling-based strategies—like random (Monte Carlo) sampling, or Latin Hypercube Sampling (LHS)—to explore the parameter space more efficiently.
While grid searches become computationally infeasible for high-dimensional models, they can still be useful for didactic purposes, or to validate patterns seen in more complex global sensitivity analyses.
Random (Monte-Carlo) sampling
A natural improvement over brute-force grid search is random sampling, also known as the Monte Carlo approach.
Instead of exhaustively looping over every possible combination, we specify a fixed number of samples \(s\), and for each one, we randomly draw a value for every parameter from its specified distribution.
This approach offers immediate advantages:
- It breaks the rigid, repetitive structure of grid search.
- It allows us to explore any part of the parameter space—not just predefined grid points.
- It scales better as the number of parameters increases.
However, Monte Carlo sampling is still not perfect. As Bruch and Atwell (2015) point out, it has its own inefficiencies:
- Clustering – Some regions of the parameter space may be sampled multiple times.
- Gaps – Other regions may be missed entirely.
- No structured coverage – There’s no guarantee that the entire space is sampled evenly or representatively.
This lack of structure becomes especially problematic in high-dimensional models, where the probability of missing important regions increases and computational efficiency becomes critical.
In short, random sampling is better than grid search—but still suboptimal when we care about efficient, uniform coverage of complex parameter spaces.
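A minimal Monte Carlo design for our three parameters might look as follows (an illustrative sketch assuming uniform priors, not the chapter's exact run_simulation() input):

```r
# Random (Monte Carlo) sampling: draw every parameter independently
# from its prior, here uniform on [0, 1] for all three parameters.
set.seed(42)
s <- 100
tbl_mc <- data.frame(
  transmit_prob = runif(s, 0, 1),
  I_R_prob      = runif(s, 0, 1),
  I_S_prob      = runif(s, 0, 1)
)

# Clustering and gaps in action: with s = 100 draws, the ten
# equal-width bins of transmit_prob are hit unevenly.
table(cut(tbl_mc$transmit_prob, breaks = seq(0, 1, by = 0.1)))
```

Tabulating the bin counts makes the "no structured coverage" point concrete: some bins receive many draws, others few.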
That’s where Latin Hypercube Sampling (LHS) comes in, offering a powerful balance between randomness and structure. We’ll explore LHS in the next section.
Latin Hypercube Sampling (LHS)
A more efficient and statistically robust method is Latin Hypercube Sampling (LHS), originally proposed by McKay, Beckman, and Conover (1979). LHS is widely used for sensitivity and uncertainty analyses in agent-based modeling and beyond.
Latin Hypercube Sampling improves on both grid and Monte Carlo methods by ensuring full coverage of the parameter space with far fewer samples.
Here’s how it works:
1. Partition each parameter’s distribution into \(s\) equally probable intervals (called strata). For example, with a uniform distribution from 0 to 1 and \(s = 10\), you divide it into intervals like \([0, 0.1), [0.1, 0.2), \dots, [0.9, 1.0)\).
2. Randomly draw one value from each stratum without replacement. This ensures that all regions of the distribution are sampled.
3. Randomly combine the samples across parameters to form \(s\) unique parameter vectors for simulation.
Unlike grid search, LHS does not evaluate the Cartesian product of all parameter combinations. Instead, it produces a stratified random sample that spans the space evenly while dramatically reducing the number of runs needed.
As a rule of thumb, for \(k\) parameters you should use at least \(s = k + 1\) samples. In practice, larger values (e.g., \(s = 10,000\)) are preferred to obtain stable estimates of parameter influence.
LHS is particularly effective when:
- You want to ensure balanced coverage without the explosion of grid combinations,
- You’re planning a global sensitivity analysis (e.g., with Sobol indices; more on these later).
In the next section, we’ll show how to implement LHS in R, use it to generate parameter sets, and integrate it with your simulation workflow.
How Latin Hypercube Sampling Works — Step by Step
Let’s make this concrete with Figure 3. Suppose we have \(k = 3\) parameters—call them \(a\), \(b\), and \(c\)—and we choose \(s = 5\) samples.
- Each parameter distribution is divided into 5 strata (non-overlapping intervals of equal probability density).
- One value is drawn randomly from each stratum for each parameter:
- For \(a\): one value from each of the 5 intervals
- For \(b\): one value from each of the 5 intervals
- For \(c\): one value from each of the 5 intervals
- These samples are then shuffled independently across parameters and randomly combined into \(s = 5\) parameter vectors:
- Run 1 uses the 4th value from \(a\), the 2nd from \(b\), and the 1st from \(c\)
- Run 2 combines the 2nd from \(a\), 1st from \(b\), and 2nd from \(c\), and so on.
The resulting design matrix \(X\) (with dimensions \(s \times k\)) ensures that each region of every parameter’s distribution is covered exactly once, while combinations are still randomized to preserve diversity.
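The three steps above can be sketched in a few lines of base R (this mirrors what lhs::randomLHS() does internally; it is a teaching sketch, not a replacement for the package):

```r
# Base-R sketch of LHS for s = 5 samples and k = 3 parameters.
set.seed(1)
s <- 5; k <- 3

# Steps 1 + 2: per parameter, permute the strata 0..s-1 and jitter
# within each stratum, so every interval [j/s, (j+1)/s) is hit once.
X <- replicate(k, (sample(0:(s - 1)) + runif(s)) / s)
colnames(X) <- c("a", "b", "c")

# Step 3 is implicit: each column was shuffled independently, so the
# rows already form randomly combined parameter vectors.
X
```

Checking that `floor(X * s)` contains each stratum index exactly once per column is a quick way to verify the stratification property.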
Independence assumption:
Standard Latin Hypercube Sampling (LHS) assumes that all parameters are independent of one another. This is often a reasonable simplification—but not always. If empirical data suggest that certain parameters are correlated (for example, infection probability might increase with network degree), then standard LHS can distort the parameter space.
In such cases, we need alternative LHS algorithms that can preserve correlations. For a more rigorous treatment of these issues, including how to quantify input dependencies and how they affect sensitivity results, see (Saltelli 2002; Saltelli et al. 2008; Marino et al. 2008).
Latin Hypercube Sampling strikes a balance between exploration and efficiency:
- More efficient than grid search, because it needs fewer simulations,
- More structured than Monte Carlo, since it guarantees every part of the distribution is sampled.
It’s the preferred method for global sensitivity analysis in complex models like ours. In the next section, we’ll show how to use LHS in R and run simulations to identify which parameters matter most.
A Minimal Workflow in R
To put Latin Hypercube Sampling into practice in our simulation framework, we need to:
- Generate LHS samples,
- Transform those samples into meaningful parameter values,
- Feed them into the simulation.
We begin by loading the lhs package:
#install.packages("lhs")
library(lhs)
The lhs package can generate basic Latin Hypercube samples with the function randomLHS(s, k), where:
- s is the number of samples (rows),
- k is the number of parameters (columns).
However, model parameters rarely follow a uniform distribution on \([0, 1]\). Some may be normally distributed, others integers, or even categorical. To handle this flexibly, we use a custom helper function tbl_parameters_LHS() to transform the uniform draws into appropriate forms:
tbl_parameters_LHS <- function(parameters, s = 20, seed = NULL) {
# for replication, allow for seeding
if (!is.null(seed)) set.seed(seed)
# 1) generate the LHS matrix
LHS_matrix <- randomLHS(s , length(parameters))
# make it a df
tbl_parameters <- as.data.frame(LHS_matrix)
# 2) transform values
for (i in seq_along(parameters)){
if (parameters[[i]][[1]] == "qinteger") tbl_parameters[,i] <- qinteger(tbl_parameters[,i], parameters[[i]][[2]], parameters[[i]][[3]])
if (parameters[[i]][[1]] == "qnorm") tbl_parameters[,i] <- qnorm(tbl_parameters[,i], parameters[[i]][[2]], parameters[[i]][[3]])
if (parameters[[i]][[1]] == "qfactor") tbl_parameters[,i] <- qfactor(tbl_parameters[,i], parameters[[i]][[2]])
if (parameters[[i]][[1]] == "qdirichlet") tbl_parameters[,i] <- qdirichlet(tbl_parameters[,i], parameters[[i]][[2]])
if (parameters[[i]][[1]] == "qnumeric") tbl_parameters[,i] <- parameters[[i]][[2]] + (tbl_parameters[,i] * parameters[[i]][[3]])
}
# name them
names(tbl_parameters) <- names(parameters)
return(tbl_parameters)
}
This function expects a parameters list, where each entry defines:
- the distribution type ("qnorm", "qinteger", "qnumeric", etc.),
- the distribution arguments (e.g., mean/sd for normal, min/max for integer).
Here’s a small example with three parameters:
parameters <- list( # provides random values in form of ...
world_size = list("qinteger", 21, 31), # ... an integer between 21 and 31
transmit_prob = list("qnumeric", 0.01, 0.5), # ... a numeric from the uniform distribution with values between 0.01 and 0.5
I_S_prob = list("qnorm", 0.125, 0.009) # ... a numeric from the normal distribution with mu = 0.125 and sd = 0.009
)
tbl_parameters_LHS(parameters, s = 5)
  world_size transmit_prob   I_S_prob
1 27 0.42097076 0.1191957
2 30 0.07956799 0.1120731
3 23 0.34230381 0.1294996
4 26 0.15693698 0.1263874
5 24 0.28694423 0.1341209
This will generate a 5 × 3 table of parameter sets, ready to be passed into your simulation.
Now let’s define a real set of parameters and plug them into our existing simulation structure. We can already use what we learned from the heatmap to reduce the parameter space:
parameters <- list( # provides random values in form of ...
  transmit_prob = list("qnumeric", 0.25, 0.60), # ... a numeric from the uniform distribution with values between 0.25 and 0.60
  I_R_prob = list("qnumeric", 0.00, 1.00), # ... a numeric from the uniform distribution with values between 0 and 1
  I_S_prob = list("qnumeric", 0.00, 1.00) # ... a numeric from the uniform distribution with values between 0 and 1
)
simulation <- list(
model = "week07_inclass",
replications = 15,
tbl_parameter = tbl_parameters_LHS(parameters, s = 100, seed = 123),
master_seed = 42,
max_cores = 15
)
And run it:
#results <- run_simulation(simulation)
results <- readRDS(file = "../data/abm-diffusion3/week08_results_02.rds")
In the next step, we’ll use this to conduct a global sensitivity analysis and determine which parameters most influence our outcomes.
Global Sensitivity Analysis
Now that we have a structured and flexible way to sample parameter values using Latin Hypercube Sampling, we’re ready to assess which parameters actually drive variation in the model outcome.
A first approach is to treat the model as a black box and fit an ordinary least-squares (OLS) regression from inputs to outputs. Our goal is to use regression coefficients (and their t-statistics) as a first-pass measure of which parameters matter most.
df <- tbl_parameters_LHS(parameters, s = 100, seed = 123)
df$total_death_rate <- results$df_final_summary$R_share_mean
A plain OLS with the three inputs gives us an initial feel for direction and magnitude:
library(stargazer)
m1 <- lm(total_death_rate ~ transmit_prob + I_R_prob + I_S_prob, data = df)
stargazer(m1,
type = "html", # "html" or "latex" for Quarto docs
digits = 3,
title = "OLS Results – Main Effects",
star.cutoffs = c(.05, .01, .001) # customise significance stars
)
<table style="text-align:center"><caption><strong>OLS Results – Main Effects</strong></caption>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>total_death_rate</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">transmit_prob</td><td>1.486<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.107)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">I_R_prob</td><td>-0.241<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.065)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">I_S_prob</td><td>-0.154<sup>*</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.065)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>0.061</td></tr>
<tr><td style="text-align:left"></td><td>(0.078)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>100</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.683</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.673</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>0.186 (df = 96)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>69.037<sup>***</sup> (df = 3; 96)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.05; <sup>**</sup>p<0.01; <sup>***</sup>p<0.001</td></tr>
</table>
The resulting model already tells us a lot. We observe an \(R^2\) of about 0.68, meaning the three input parameters explain nearly 70% of the variance in the outcome. This is already a substantial amount for such a simple model. As expected, both I_R_prob and I_S_prob have negative coefficients. This makes intuitive sense: the higher the probability that agents are removed or recover, the lower the overall death rate in the model. The transmission probability transmit_prob, on the other hand, has a positive and significant effect, increasing the number of infections and, eventually, deaths.
To probe deeper, we add interaction terms between the three predictors and fit a second model.
m2 <- lm(total_death_rate ~ transmit_prob * I_R_prob * I_S_prob, data = df)
stargazer(m2,
type = "html", # "html" or "latex" for Quarto docs
digits = 3,
title = "OLS Results – Main Effects + Interaction Effects",
star.cutoffs = c(.05, .01, .001) # customise significance stars
)
<table style="text-align:center"><caption><strong>OLS Results – Main Effects + Interaction Effects</strong></caption>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>total_death_rate</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">transmit_prob</td><td>0.077</td></tr>
<tr><td style="text-align:left"></td><td>(0.364)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">I_R_prob</td><td>-1.782<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.366)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">I_S_prob</td><td>-0.859<sup>*</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.365)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">transmit_prob:I_R_prob</td><td>2.409<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.643)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">transmit_prob:I_S_prob</td><td>0.791</td></tr>
<tr><td style="text-align:left"></td><td>(0.626)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">I_R_prob:I_S_prob</td><td>0.910</td></tr>
<tr><td style="text-align:left"></td><td>(0.655)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">transmit_prob:I_R_prob:I_S_prob</td><td>-0.523</td></tr>
<tr><td style="text-align:left"></td><td>(1.194)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>0.923<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.211)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>100</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.795</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.779</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>0.153 (df = 92)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>50.943<sup>***</sup> (df = 7; 92)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.05; <sup>**</sup>p<0.01; <sup>***</sup>p<0.001</td></tr>
</table>
This second model increases the explanatory power further, raising the \(R^2\) to about 0.80. That jump suggests that interactions between the parameters are important to nearly fully capture the system’s dynamics. Interestingly, once interactions are accounted for, the main effect of transmit_prob becomes non-significant. Instead, the interaction between transmit_prob and I_R_prob becomes a strong positive predictor of the outcome. In other words, it’s not just that transmission or removal alone matter, but that their combination has a particularly strong effect—something we already saw visually in the earlier heatmap Figure 1.
OLS provides an excellent starting point for global sensitivity analysis. It’s fast, easy to interpret, and gives clear guidance on which parameters are worth investigating further. However, we must be cautious in interpreting the results too literally. The reliability of OLS depends on the assumption that relationships are linear (or at least monotonic) within the sampled parameter space. If the model exhibits threshold effects, tipping points, or strong curvature, these patterns may be obscured in a linear approximation. Multicollinearity between predictors can also mask their individual effects.
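One inexpensive way to probe such non-linearity is to compare the OLS picture against Spearman rank correlations, which are robust to any monotonic transformation. The sketch below uses synthetic stand-in data so it runs on its own (the outcome formula is invented purely for illustration); in the chapter's workflow you would apply the same sapply() call to the real LHS design df instead:

```r
# Rank-based sensitivity check on synthetic data with a steep
# threshold in transmit_prob (invented for illustration only).
set.seed(123)
df_demo <- data.frame(
  transmit_prob = runif(100, 0.25, 0.60),
  I_R_prob      = runif(100),
  I_S_prob      = runif(100)
)
df_demo$total_death_rate <- plogis(20 * (df_demo$transmit_prob - 0.4)) -
  0.2 * df_demo$I_R_prob + rnorm(100, sd = 0.05)

# Spearman correlation of each input with the outcome
sapply(df_demo[1:3], cor, y = df_demo$total_death_rate, method = "spearman")
```

If the rank correlations and the OLS coefficients disagree strongly, that is a warning sign that the linear approximation is hiding thresholds or curvature.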
In sum, OLS regression serves as a useful diagnostic tool. It shows us where to focus attention, which parameters are likely to matter, and which may be safely ignored in early stages. But for a more comprehensive and non-parametric picture of sensitivity, variance-based methods such as Sobol indices are still the gold standard. For a clear and accessible comparison of these methods—including practical advice for implementation—see Ten Broeke, Van Voorn, and Ligtenberg (2016).
Model uncertainty
While input uncertainty concerns the parameters fed into the model, model uncertainty arises from the architecture of the simulation itself—how the model is implemented. This includes technical and design decisions that, while often made for pragmatic reasons, can have unintended effects on the outcome.
In contrast to input uncertainty, there is no systematic, widely accepted procedure for assessing model uncertainty. Instead, it falls on the researcher to identify and test architectural assumptions that could matter.
Some typical examples include:
- Order of operations: In what sequence are state transitions executed?
- Tie-breaking rules: What happens when agents face equally attractive options? (e.g., \(EU(A) = EU(B)\))
- Random draws: Are you using uniform, normal, or empirically derived distributions?
- Functional forms: Are behaviors triggered by thresholds or governed by probabilities (e.g., logistic functions)?
- Interaction structures: Are agents placed on a grid, a random network, or a real-world social graph?
- Stochastic sequencing: In what order are agents updated, and are those updates synchronous or asynchronous?
- Floating-point precision: How do rounding errors affect outcomes? (Pilat, Suzuki, and Arita 2012)
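To see how much the synchronous/asynchronous choice alone can matter, consider a deliberately tiny toy example (an assumption-laden sketch, not the chapter's run_step()): two agents who each copy the other's state.

```r
# Two agents; each rule says "copy the other agent's state".
state <- c(a = 0, b = 1)

# Synchronous: both agents read the OLD state, then write together.
sync <- c(a = unname(state["b"]), b = unname(state["a"]))
sync   # a = 1, b = 0: the states swap forever

# Asynchronous (sequential): agent a updates first, and agent b
# then reads a's NEW state.
async <- state
async["a"] <- async["b"]   # a copies b -> a = 1
async["b"] <- async["a"]   # b copies the updated a -> b = 1
async  # a = 1, b = 1: the states converge immediately
```

The same micro-rule produces perpetual oscillation under one scheduler and instant consensus under the other, which is exactly why scheduling belongs on the robustness checklist.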
Example from Our Simulation
In our current implementation of run_step(), we follow a fixed sequence that looks like this:
- Save the list of currently infectious agents
- Apply a series of transitions:
- (1A) Infectious → Removed
- (1B) Removed → Susceptible
- (1C) Infectious → Susceptible
- (1D) Susceptible → Infectious
- Spread infection from the initially infectious agents
Although steps 1A through 1D are all part of the same time step, they occur in a strict sequence. This means an agent could, within a single step:
- Move from Infectious to Removed (1A),
- Then from Removed to Susceptible (1B),
- And potentially back to Infectious again via infection (1D).
This raises a critical point: model outcomes can be influenced by arbitrary architectural decisions, such as which transitions are applied first. Given that, take a moment to think about the following question:
What is the probability that a specific agent who is Infectious at the beginning of the step becomes Removed at the end of the step (let’s call it \(p_{IR}\))?
You might think it’s just I_R_prob—and you’d be close—but not quite right. This is because agents who become Removed in step 1A may immediately transition to Susceptible in step 1B. So the probability of remaining in state R at the end of the step is:
\[ p_{IR} = \text{I_R_prob} \cdot (1 - \text{R_S_prob}) \]
In a similar way, we can derive the probability of transitioning to Susceptible:
\[
p_{IS} = (1 - \text{I_R_prob}) \cdot \text{I_S_prob} + \text{I_R_prob} \cdot \text{R_S_prob}
\]
And since there are only three possible destinations (S, I, or R), the remaining probability is:
\[ p_{II} = 1 - p_{IR} - p_{IS} \]
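These derived step probabilities are easy to check numerically. The sketch below uses illustrative parameter values (not calibrated ones) and verifies that the three destinations exhaust the probability mass:

```r
# Step probabilities implied by the sequential 1A -> 1B -> 1C ordering,
# for an agent that starts the step Infectious.
# Parameter values are illustrative, not calibrated.
I_R_prob <- 0.10   # 1A: Infectious -> Removed
R_S_prob <- 0.05   # 1B: Removed -> Susceptible
I_S_prob <- 0.20   # 1C: Infectious -> Susceptible

p_IR <- I_R_prob * (1 - R_S_prob)                         # removed and stays removed
p_IS <- (1 - I_R_prob) * I_S_prob + I_R_prob * R_S_prob   # two routes into S
p_II <- 1 - p_IR - p_IS                                   # residual: still Infectious

c(p_IR = p_IR, p_IS = p_IS, p_II = p_II)
stopifnot(isTRUE(all.equal(p_IR + p_IS + p_II, 1)))  # destinations exhaust the step
```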
These probabilities are not encoded explicitly in a transition matrix. Instead, they emerge from the procedural logic of the run_step() function, which makes it harder to verify or reason about them systematically. And logically, there is no strong justification for executing the transitions in this particular order. What if we swapped it?
To explore whether this matters, I reran the simulation with steps 1A and 1C reversed, keeping everything else the same. The results are plotted below.
results <- readRDS("../data/abm-diffusion3/week08_results_03.rds")
plot_SIR_shares_interactive(
df = results$df_final_summary,
x_var = "transmit_prob",
y_var = "I_R_prob"
)
In this specific setup, the change in order did not drastically affect the results. However, that doesn’t guarantee stability across all parameter combinations. In some regimes, architectural differences like these could lead to substantially different outcomes.
Toward a More Robust Implementation
One effective way to reduce model uncertainty is to introduce a clearer and more controlled transition structure. Specifically, we could adopt two design principles:
- Each agent is allowed to change state only once per time step (i.e., during step 1), and
- All transitions from a given state are governed by an explicit probability vector—for example:
\[ p_{IS} + p_{II} + p_{IR} = 1 \]
In this case, the probabilities would be defined directly, rather than emerging from the sequence of conditional checks in the code. This structure makes the model easier to understand, easier to test, and more transparent—since the meaning of each probability is self-contained and not entangled with execution order.
In the same way, we could define analogous transition vectors for other states (e.g., S, R), building a complete and interpretable transition matrix.
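A minimal sketch of this design is shown below. The transition list and its probabilities are illustrative, and update_state() is a hypothetical helper, not the chapter's actual run_step() implementation:

```r
# Sketch of the "one transition per step" design: each state change is a
# single draw from an explicit probability vector.
# The transition list and update_state() are illustrative, not the
# chapter's run_step() implementation.
transition <- list(
  I = c(S = 0.15, I = 0.75, R = 0.10),  # p_IS + p_II + p_IR = 1
  R = c(S = 0.05, I = 0.00, R = 0.95),
  S = c(S = 1.00, I = 0.00, R = 0.00)   # infection itself handled separately
)
# Each row of the implied transition matrix must sum to 1
stopifnot(all(abs(sapply(transition, sum) - 1) < 1e-12))

update_state <- function(state) {
  probs <- transition[[state]]
  sample(names(probs), size = 1, prob = probs)
}

set.seed(42)
table(replicate(1000, update_state("I")))  # empirical shares track the vector
```

Because each probability vector is stated explicitly, the matrix can be inspected and unit-tested directly, rather than reverse-engineered from a sequence of conditional checks.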
Important caveat:
If we have good theoretical or empirical reasons to model transitions as sequential (e.g., agents can change state more than once per step), that’s perfectly valid. But if we don’t, the simplified approach described here avoids ambiguity and supports cleaner reasoning.
Stochastic Variability
Even if the model structure and parameters are fixed, randomness built into the simulation process can still lead to different outcomes across runs. This type of uncertainty—stochastic variability—comes from:
- Random initial conditions (e.g., which agents are infected at the start),
- Random behavioral decisions (e.g., who contacts whom),
- Random events (e.g., transmission success or failure).
Since outcomes can vary just by chance, replication is essential. By running the same model multiple times, we can estimate how much of the observed outcome is due to randomness versus stable underlying dynamics.
In our simulations, we handle this with the run_replication() function. Each parameter set is evaluated multiple times, and the results are aggregated (e.g., by computing the mean, median, or standard deviation of outcome variables like R_share or peak_infectious).
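The replicate-and-summarize pattern can be sketched with a toy stand-in for a model run. simulate_once() below is hypothetical and only mimics a "die out or take off" outcome, but the pattern is the same one run_replication() applies to the real simulation:

```r
# Sketch of replication + aggregation with a toy stochastic "model".
# simulate_once() is a hypothetical stand-in for one full simulation run.
simulate_once <- function(transmit_prob) {
  # outbreak either dies out immediately or takes off
  if (runif(1) < 0.4) 0 else rbinom(1, 100, transmit_prob) / 100
}

set.seed(7)
R_share <- replicate(500, simulate_once(transmit_prob = 0.6))

# Report spread, not just the point estimate
c(mean = mean(R_share), sd = sd(R_share))
```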
The appropriate number of replications depends on several factors:
- Model sensitivity to randomness,
- Variance of the outcome variable,
- Desired precision, and
- Available computing resources.
Rule of thumb: If your model is deterministic, a single run is enough. If not, the more replications you run, the better your estimate of average behavior—and its variability.
Yet, relying only on average outcomes can be misleading. Take a look at the two plots below in Figure 5. In both cases, we simulated the same ABM for a disease and measured the number of infectious agents over time. Each was run 10,000 times. The mean outcome is the same in both panels, but the distributions tell very different stories:
In Figure A, the sharp spike at 0 shows that in nearly half the replications, the disease went extinct immediately. In Figure B, most runs produced a sizable outbreak. Looking only at the mean would hide this crucial difference.
Always inspect the variance, standard deviation, or distribution of outcomes—not just point estimates.
How Many Replications Do I Need?
There is no universal rule for the number of replications required—it depends on:
- Model complexity,
- Desired statistical confidence,
- Outcome variance.
But it also depends on model design. Consider an iterated Prisoner’s Dilemma model with memory = 1, in which there are 32 possible strategies. If we simulate small populations (e.g., 10 agents), a random draw might include only a few of them. Without mutation, the outcome depends heavily on which strategies happened to be included.
In contrast, in a large population (e.g., 1 million agents), the law of large numbers ensures that almost all strategies are present in roughly equal proportions. In such cases, one run might already reflect the long-term average behavior—so we may need fewer replications.
Sometimes the best way to reduce variability is through better model design. If you’re not specifically interested in how initial distributions shape dynamics, use a large enough world size that randomness at the start becomes negligible. In such cases, spending more time per run may let you spend less time overall, because you’ll need fewer replications to reach reliable conclusions.
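A quick sanity check of this coverage argument, assuming strategies are drawn uniformly at random:

```r
# How many of the 32 memory-one strategies appear by chance
# in a small vs. a large randomly drawn population?
set.seed(123)
small_pop <- sample(1:32, size = 10,  replace = TRUE)
large_pop <- sample(1:32, size = 1e6, replace = TRUE)

length(unique(small_pop))  # only a handful of strategies represented
length(unique(large_pop))  # all 32 present
```

With a million agents, the chance that any single strategy is entirely absent is vanishingly small, which is exactly why a single large run can already approximate the long-run average.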
Adding Some Realism
To wrap up this week’s material, we’ll nudge our toy model a bit closer to the messy world it is meant to illuminate. Agent-based models only become genuinely useful for experimentation when their ingredients resemble reality. That doesn’t mean we need a photo-realistic replica of society—but each iteration should move us in that direction.
Today’s upgrades are deliberately modest: We’ll swap hand-wavy guesses for rule-of-thumb estimates grounded in data or domain knowledge.
More Realistic Parameters
Let’s take a look at the recovery probability and add more realism to our model by making parameter choices that are grounded in empirical assumptions. Suppose we assume that an infected person remains sick for an average of 7 days. In our simulation, this recovery process is modeled as a Bernoulli trial, where in each time step (say, each day), the agent recovers with probability I_S_prob.
The expected number of steps until recovery in such a process is:
\[ \mathbb{E}[\text{duration}] = \frac{1}{\text{I_S_prob}} \]
Solving for I_S_prob when the average duration is 7 days:
\[ \text{I_S_prob} = \frac{1}{7} \approx 0.143 \]
Now, suppose we also want some variability in sickness duration, but not too much—say, people should typically recover within ±1 day of the average. That is, most sickness durations should fall between 6 and 8 days.
This implies that the recovery probabilities should vary between:
\[ \frac{1}{8} = 0.125 \quad \text{and} \quad \frac{1}{6} \approx 0.167 \]
To capture this range statistically, we want roughly 95% of our values (i.e., ±2 standard deviations) to lie within this interval. That gives us:
\[ \sigma = \frac{0.143 - 0.125}{2} \approx 0.009 \]
So the standard deviation should be about 0.009 to reflect a reasonable spread of recovery probabilities between simulation runs.
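These back-of-envelope numbers can be verified in a few lines:

```r
# Checking the calibration arithmetic: mean sickness duration of 7 days,
# with 6-8 days covering roughly +/- 2 standard deviations
I_S_prob <- 1 / 7              # expected duration = 1 / p = 7 days
p_slow   <- 1 / 8              # 8-day average duration
p_fast   <- 1 / 6              # 6-day average duration
sigma    <- (I_S_prob - p_slow) / 2

round(c(I_S_prob = I_S_prob, p_slow = p_slow, p_fast = p_fast, sigma = sigma), 3)
# sigma comes out to about 0.009, matching the text
```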
This of course only holds if we ignore other probabilities that influence the length of the infection, such as I_R_prob. Another approach would be to count the average number of days an agent is infected and to choose I_S_prob such that it corresponds to our empirical observation of 7 days. But we will discuss this method in more detail in a later chapter.
🧩 Conclusion
🏋 Pre-Class Assignment
Before the next session, take some time to engage with the model hands-on.
[Medium]
Try to identify parameter values that are grounded in real-world knowledge or plausible assumptions. You can use domain expertise, quick literature searches, or even rough estimates.
Reflect on this: What kind of interpretation or real-world behavior do these parameters represent? And how might they affect the system’s dynamics?
[Medium]
Formulate a small, focused research question you’d like to explore with the model.
Then, set up a Latin Hypercube Sample (LHS) to investigate how the model behaves under different parameter combinations.
Run the simulation, collect the output, and analyze it.
What does your analysis suggest about which parameters influence the outcome most?
🚀 Link to In-Class Material
Here is the link for the in-class material.